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Abstract 

The tools of optimal estimation are applied to the study of subgrid models for Large- 
Eddy Simulation of turbulence. The concept of optimal estimator is introduced and 
its properties are analyzed in the context of applications to a priori tests of subgrid 
models. Attention is focused on the Cook and Riley model in the case of a scalar 
field in isotropic turbulence. Using DNS data, the relevance of the (3 assumption is 
estimated by computing (i) generalized optimal estimators and (ii) the error brought 
by this assumption alone. Optimal estimators are computed for the subgrid variance 
using various sets of variables and various techniques (histograms and neural networks) . 
It is shown that optimal estimators allow a thorough exploration of models. Neural 
networks are proved to be relevant and very efficient in this framework, and further 
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usages are suggested. 

1 Introduction 

The principle of Large Eddy Simulations is to solve the evolution equations only for the 
large scales of a turbulent flow. Since the large eddies do not contain all the information 
that would be necessary to compute the future of a given flowQ] , the evolution equations 
for the large eddies are not closed. It is then a generic problem in any type of LES, that 
the unknown terms have to be approximated using known quantities, i.e. quantities 
that can be computed directly using information associated with the resolved field, 
or quantities estimated via additional subgrid variables solution of auxiliary evolution 
equations (a subgrid turbulent stress tensor [21 , a subgrid probability [S] , or a subgrid 
spectrum jl]...). 

The aim of the present paper is to introduce the concept of optimal estimator as a 
tool to estimate the minimal error that a perfect subgrid model based on a given set of 
known large scale quantities (the variables of the model) will generate (the irreducible 
error). This optimal estimator strategy is considered by the authors of the present 
paper as a very helpful process in the field of subgrid modeling, since it provides a way 
of assessing the relevance of the set of variables on which a subgrid model will be built, 
before having to specify the precise form of the model. 

The optimal estimator can be computed numerically if the true subgrid term is 
known, that is to say using the results of a Direct Numerical Simulation (DNS). It 
is therefore a concept that is developed in the framework of what is usually referred 
to as "a priori" test of subgrid models. The first sections of the paper are devoted 
to presenting the method of building the optimal estimator using different techniques. 
The classical technique relies on histograms, and a new and promising technique based 
on neural networks is introduced. It is pointed out in section 7 that neural networks 
are indeed particularly relevant and efficient to build the optimal estimator when the 
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number of parameters to be included in the subgrid model increases. Some classical 
results in optimal estimation will also be exposed in the two following sections. 

In the second part of the paper (sections 4 to 7), the interest of optimal estimators is 
illustrated in the particular case of the subgrid modeling problem for a simple reaction 
term depending on a single passive scalar in isotropic homogeneous turbulence. The 
procedure is used to assess the performances of the popular Cook and Riley modellS], 
which uses a /3-distribution as a presumed form of the Filtered Density Function (FDF). 

It is shown how it allows to distinguish between the various sources of errors in the 
model. Once the error brought by each assumption contained in the model has been 
estimated, it is easy to identify at which level improvements could be made. 

We will particularly compare the error associated with the choice of the (3 distri- 
butions, with the one resulting from the use of a sub- model to estimate the subgrid 
variance. We will address the problem of the relevancy of the different variables which 
have been used in the literature^ E] to express the subgrid variance. 



The optimal estimator $7 for a quantity 7, from a set vr of quantities (that we will 
call the variables) and a norm ||.||, minimizes ||f7(7r) — 7||. Optimal estimators are 
typically defined for the L2-norm (quadratic error) and then $7 minimizes ||r2(7r)— 7IP = 



A model is a function (7(vr) which aims at approaching 7 (and thus ^^(vr)) as closely 
as possible. In a recent workQj Langford and Moser firmly asserted that the quadratic 
error is the relevant error to consider in LES. This point of view is here adopted without 
further discussion and therefore the quadratic error will be retained throughout the 
paper as the relevant criterium for assessing the quality of a subgrid model. This 



2 Properties of optimal estimators 
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section surveys and shows some properties of optimal estimators in relation with the 
quadratic error. 

The quadratic error made by g{Tr) when estimating 7 is 

E, = {{^ - 9{7r)f) . (1) 

The quadratic error satisfies the following orthogonality relation (see the proof in 
appendix A) 

((7 - 9{^)?) = ((7 - (7l vr))^) + (( (7K) - 9{^)f) ■ (2) 

Since the last term in the RHS of equation|21is the expectation of a positive quantity, 
it is positive. Thus for any g 

((7-5(vr)f)>((7-(7|vr))'). (3) 

This relation means that any subgrid model g, built on the set of variables vr, will 
lead to quadratic errors larger than the one made by the conditional expectation (7I vr). 

We will call the error made by the conditional expectation the irreducible error 
made by estimating 7 using vr, since no model using vr as variables can make a smaller 
error. The last term in can be written 

(((7K)-g(^))')= y"((7|vr)-5(^))2p(^)dvr. (4) 

It is equal to zero only if g{Tr) = (7I vr). This means that the conditional expectation 
is the unique best model[7j. For the quadratic error, the optimal estimator r2(7r) using 
vr as variables is thus the conditional expectation (7I vr). 

Let tt' be a set of variables that can be computed using vr. The optimal estimator 
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for vr' is thus implicitly a function of vr, so that equation © becomes 



((7-(7|vr'»')>((7-(7|vr)f) 



(5) 



The irreducible error associated with vr' is then always greater than the irreducible 
error associated with vr. 

The optimal estimators verify another property that is called the "successive con- 
ditioning" 



The proof of this property is given in appendix B. When considering a given model, it 
is common to draw a cloud of points with 7 on the y axis and (7(vr) on the x axis 1310] ■ 
For a given value a of (/(vr), it is possible to compute the mean of 7 for all the points 
that are close to a. This can be done for all the values of a and drawn on the figure(i.e., 
moving averages). This is a way of representing (7I g(vr) = o), which is a function of a. 
The successive conditioning of the optimal estimators means that if ^(vr) is an optimal 
estimator, then all the points computed as described should be on the y = x line. 

Provided they can be computed, optimal estimators (i) allow to know if a given 
model is far from the optimal estimator by comparing the error it makes to the irre- 
ducible error; (ii) suggest ways of improving models just by representing the optimal 
estimators when this is possible and (iii) allow to compare different sets of variables 
quantitatively, by computing the irreducible error for each set. 



3 Practical computation of optimal estimators 



Now that the properties of optimal estimators have been presented, we will turn to the 
practical computation of optimal estimators using data: optimal estimation. Optimal 
estimation from data in non-parametric frameworks (i.e. without prior knowledge of 
the function to be approximated) is a wide area of research consisting in designing 
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algorithms, termed learning algorithms, that use data (7ri,7i), . . . , (7r„,7„) to compute 
approximations fn of the optimal with some nice convergence properties of fn to O 
as n — > oo. 

The main usual hypothesis is that the data are independent and identically dis- 
tributed. Various results of Universal Consistency (UC), i.e. asymptotic convergence 
towards the optimal function in L^-norm, have been proved for various techniques; 
histogram-rules ([Sj), fc-nearest neighbours [Sj, neural networks ( 13 ), gaussian support 
vector machines (^0]); various general results using VC-theory include wide families of 
methods There is no possible universal convergence rate; a method is better or 

worse than another depending on the distribution of the examples. However, various 
heuristics for choosing between various learning-algorithms are well-known: support 
vector machines are often efficient for generalizing from very small samples or when 
relevant kernels can be defined, fc-nearest neighbours only need a metric, histograms 
are simple and interpretable, neural networks do not work well in huge (non-sparse) 
dimensionality but can deal with very large numbers of examples. 

In consequence, two techniques have been chosen here, in the framework of L^- 
norm (quadratic error) using large DNS data. The first one is the most intuitive and 
is based on the fact that optimal estimators are conditional expectations. This is the 
"histogram technique". The second one is based on the fact that optimal estimators 
minimize the quadratic error. It uses neural networks. 

First, a conditional expectation can be approximated by a piecewise-constant 
function (a histogram). The vr space (whose dimension is the number of variables of 
the model) is discretized in small cells. Each data point (a value of 7 and of vr that 
has been produced by a DNS) belongs to a given cell of the vr space. Let us consider 
the piecewise-constant function which associates to each cell the mean of 7 for all the 
data points belonging to the cell. This function is an approximation of the optimal 
estimator. When you have vr, you can take as an approximation of (7! vr) the value of 
the previous function for the cell corresponding to vr. 
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The main difficulty is tlie clioice of the size of the ceh. If the size is too big, 
the piecewise-constant function will obviously not be a good approximation of the 
conditional expectation. If the size of the cell is too small, too few points will be 
contained in each cell and the value associated with each cell will not be reliable. 

In order to overcome this difficulty, one has to divide the data into two parts. The 
first one is used for the computation of the piecewise-constant function. Then the error 
made by the piecewise-constant function when estimating 7 for the second part of the 
data will be computed. This error is the generalization error. The relevant size for 
the cells is the one for which the generalization error is minimized. Figure ^ shows the 
generalization error in function of the number of cells for a given range. 

[Figure 1 about here.] 

The optimal estimators can be approached using neural networks instead of 
piecewise-constant functions. A neural network can be seen as a parametric func- 
tion. For a perceptron with a single hidden layer ll2j, this function can be written 



where is the number of neurons in the hidden layer, A^^ is the number of variables 
in vr and vr^ is the k^^ parameter. In the NN-terminology, the parameters Bj^ are 
called the weights of the first layer, the Aj are the weights of the second layer; a and 
the bj are the thresholds. By adjusting the weights of a neural network, it is possible 
to approach almost any function. Formally, neural networks with one hidden layer 
of neurons have the universal approximation property, i.e. they can approximate any 
measurable function for the L^-norm, and they have the statistical consistency, i.e. 
this convergence occurs with probability one when the network is trained from data if 
the number of neurons increases DroDerlv|13j. A typical neural network is represented 
figure 12 
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As previoulsy, the data is split into two parts. Using the first part, the neural net- 
work is trained: the weights are adjusted so that the error made by the neural network 
is minimized. This means that the neural network is a numerical approximation of the 
optimal estimator. The learning is made using a back-propagation algorithm |14[ [T^ . 

Then, the generalization error is computed. The generalization error is the quadratic 
error made by the neural network on the rest of the data (on the data that have not 
been used for choosing the weights). The number of hidden neurons is chosen in order 
to minimize this generalization error. 

Neural networks allow to compute optimal estimators for a number of variables 
which is greater than 3. Let us just stress that neural networks are usual tools for 
pattern recognition, estimation of conditional expectations, density estimation \V2\ . 
They have been applied in various areas of physics f |15lll6j ^. even in fluid mechanics jl7j. 
Other forms of statistical learning tools derived from neural networks have also been 
experimented, particularly Support Vector Machines f jl8lll9p . We have chosen neural 
networks because Support Vector Machines, at least in their most standard form, do not 
use the mean square error, and therefore are not conditional-expectation estimators, 
whereas standard neural networks arejJSI- However, less standard forms of Support 
Vector Machines could also be used [20]; but SVM are much slower than neural networks 
for large data sets such as the ones we will use further on. 

[Figure 2 about here.] 

The results obtained using neural networks are presented section Q 

4 Filtered Density Functions 

The case of a scalar field c(x) advected by turbulence is now considered (c(x) repre- 
senting for instance a temperature or the concentration of a chemical species). The 
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large scales of the scalar field are denoted by c(x) defined as 



-c{x) = ^ I c{x',t) dx' (8) 
-Id{x) 



where D{x) denotes a cube of edge-length h, centered in x. The t filter considered here 
is then a box filter in the physical space. It is a positive filter: the filtering can be 
written as a convolution c = G * c of the scalar field by a function 

Gi-) = ^sflH(^-\-'^\) (9) 



1=1 ^ 

which is positive (see appendix C). Here, H is the Heaviside function. In the Fourier 
space, the filter corresponds to a product of the Fourier transform of the scalar fiuctu- 
ation by 

± /I \ 

(10) 



G{k) = fl sine (^^h 



In this article, the scalar field is bounded: < c{x) < 1. As already pointed out [HI, 
the large eddies are bounded in the same way only if G > and J G = 1. 
We now consider quantities that can be written as: 



/(c)(x) (11) 

where / is a function. It is not specified here, but /(c) represents a quantity that 
is important for the simulation and whose filtered value requires closure: a (simple) 



chemical reaction term for example. The quantity /(c) (x) only depends on the Filtered 
Density Function (FDF) which is defined PT] by 



ds{C, x) = 6(C- c(x*')) (x) (12) 
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The link between the FDF and the quantity of interest is the following relation pT| 

Jic){x) = j /(C)d.(C,f)dC (13) 
which means that the knowledge of d^ (which is sometimes called the Subgrid PDF) 



allows to compute /(c) whatever / is. The FDF is not a statistical quantity: it is 
defined for a given realization of the flow and for a given x. Now that this has been 
underlined, the space dependence of the FDF will be omitted in the following - as for 



/(c) or c. 

The mean (on the cube D{x) and not in the statistical sense) of the FDF is its first 
moment. It is simply equal to c since 



j xds{x)(ix = j X 5{x — c) dx = c. (14) 
The variance of the FDF will be called the subgrid variance 

al= f {x-cfds{x)dx = '^ -c^. (15) 



For a given cube D = D{x), let us consider the proportion of the cube which 
contains a scalar field bounded above by C. It will be noted Vd{C) and can be written 

Vd{C) = ^jH{C- c(x)) df , (16) 
where H is the Heaviside function. We then have the following relation 

dVn 



^ / ^H{C-c{x))dx 
d{C -c{x))dx = ds 



dC /i3 
1 



The FDF thus gives information about the distribution of the values of the scalar field 
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in the cube D{x) since Vd{C) = /_oo ^«('^) Points can be chosen randomly in 
the cube D{x). A histogram made using the values of the scalar at these points will 
approach the FDF. This gives a way of computing the FDF provided that the field is 
known everywhere in the cube D{x). 

We used data from a pseudo-spectral DNS with periodic boundaries. In such a 
simulation, the evolution equation of the flow is solved in the Fourier space. The fields 
of the velocity and of the scalar are then defined by their first Fourier modes. For the 
scalar, this can be written 

where ojq = ^ {L being the size of the simulation domain) where k is the number 
of modes retained and the ajj',fc coefficients are the amplitudes of the Fourier modes. 
The Fast Fourier Transform (FFT) computes the field at special points (grid nodes) 
because (i) there is a mapping between the values of the fields at the grid nodes and 
the amplitudes of the Fourier modes and (ii) equation ()17|) can be factorized, so that 
the computation of the field at the grid nodes is easier. But let us stress the fact that 
the field is defined everywhere in the physical space by ()17() . It can be computed for 
an arbitrary x by summing the contributions of the different modes at this particular 
point. This operation is of course very costly compared to a FFT, but it is necessary. 
We have tried to approximate the field between the nodes using a simple interpolation, 
which requires less computation time, but the results are not satisfactory. The FDF 
computed using interpolation substantially differs from the one computed using the 
rigorous formula ()17() . 

The box filter (jSJ cannot easily be computed in the physical space. On the contrary, 
it is simple to perform in the Fourier space, since the amplitudes of the Fourier modes 
just have to be multiplied by a function given by UlUj) . This is a rigorous method in 
the sense that the obtained field exacly satisfies © in the physical space, reflecting the 
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fact that the field is indeed implicitly defined everywhere in the physical space when 
the Fourier modes are known. 

The DNS we have performed use a particular injection method for the scalar 
field. Periodically in time, large cubes are chosen randomly in the simulation do- 
main. "Fresh" scalar is injected in these cubes: in half the cubes the field is put to zero 
and in the other half it is put to 1. A view of the scalar field is shown figure 01 It has 
to be stressed that the scalar fluctuation always satisfies < c < 1. The characteristics 
of the simulations are detailed in |22j . 

[Figure 3 about here.] 

Figure 0] shows several FDFs with extremely close means and variances. The cubes 
have been chosen so that c = 0.5 ± 0.01 and al = 0.0055 ± 0.0001. 

[Figure 4 about here.] 



5 FDF modelling 

In the framework of LES, the FDF can be estimated either by solving an equation 
governing its evolution [23113] or by using a model for the FDF. The model proposed by 
Cook and RileyjSl uses a presumed form for the FDFs. It has drawn much attention 
and has been the subject of several studies |Hll24j. In this model, the main assumption 
is that the FDF can be approximated by a /3 distribution with same mean and same 
variance as the real FDFs (i.e. the same first moments). The definition of the [i 
distribution is as follows: 

with 
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r(a) being the gamma function of Euler. 

In order to have the right mean and variance, a and b must be chosen so that 



- ,'c(l - c) i\ A I. ^ 
a = c [ 7^ 1 and o = a. 



The presumed form for the FDF can be used in equation (|13|) . This provides a model 
for /(c) whatever / is, using c and o"^ as variables. The estimator of /(c) can be written 



f{x)p{x;c,a^,) dx. (20) 

Since the definition of the variables is crucial for the optimal estimators, we will pay 
much attention to the often implicit choice of the fundamental variables. Here c and 

are the variables. Hence we will denote vri = |c, CTg} this first set of fundamental 
variables. The choice of (3 distributions will be called the "/3 assumption". 

The subgrid variance o"^ cannot be computed using the large eddies c only. A sub- 
model is thus necessary so that the (3 assumption can be used. Let us denote 7 a test 
filter of a characteristic size twice as big as for t. Cook and Riley assume that the 
subgrid variance is proportional to the quantity 



a 



This estimation is used to approximate the FDF and the whole model thus provides 
an estimation of /(c) based on the variables c and a only. We will denote tt2 = {c, a} 
this second set of variables. 

Another sub-model has been proposed by Pierce and Moin[2n]. They assume that 
CTg is proportional to the modulus of the gradient of the filtered scalar, which we 
will denote e-c = {dic)'^. In the following, we will denote 7r2 = {c, Ec} the variables 
corresponding to this modelization. 
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6 Validity of the p assumption 



Our purpose is to know if there is an optimal choice for the presumed form of the FDF. 
Of course this optimal choice depends on the variables vr used for the model. The fact 
that there is an optimal choice is not obvious. The optimal estimators used in the first 
part are defined in the case where a scalar quantity 7 has to be estimated using vr. 
Here the model provides a function for each different value of vr. No relevant measure 
of the error can be defined in this case. But we have the relation 



This means that (d^ | vr) is the optimal estimator for the approximation of the FDF 
using vr since when it is used in equation (|13() . the estimator of /(c) which is obtained 
is the optimal estimator for /(c) using vr. In this particular case the optimal estimator 
concept can therefore easily be extended. This quantity has already been considered in 
the case where the variables are c and a^. It has been called the conditional FDF|26). 
It is sometimes refered to as the FPDF|77]. 

This optimal estimator can be computed using the following natural method: first, 
grid nodes are chosen for which vr has a value very close to an arbitrary given one, then 
the FDF is computed for each of these nodes, and finally, all the FDFs are averaged 
to obtain the optimal estimator. 

Let us consider the set of variables vri = {c, cj^}, which is the case when the subgrid 
variance is known. Comparisons between beta distributions and the optimal estimators 
are shown for three sets of values of the variables in figure El The cubes which are 
selected for the computation of the optimal estimators are chosen so that c = 0.5 it 0.01 
and 0-2 = 0.0055 ±0.0001 for the first comparison, c = 0.25 ±0.01 and fi^ = 0.01 ±0.001 
for the second one and c = 0.1 ± 0.01 and = 0.01 ± 0.001 for the last one. 




(21) 
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The correspondence between the /3 distributions and the optimal estimators is ex- 
cellent. The /? distributions can thus be considered as a very appropriate presumed 
form for the FDF, as long as is known. We must point out that this does not 
mean that the FDFs are actually f3 distributions 0, as shown figure 01 We agree [S] 
that the /? assumption seems to be appropriate for any subvolume. We think that a 
mathematical property of /? distributions could explain these results but we were not 
able to find it. Here, the size filter is about four grid nodes and has been chosen so 
that all values of c, (or a) are well represented in the statistical sampling process. 

[Figure 5 about here.] 

Let us consider the case when the variables of the model are c and a. Figure 
shows examples of FDFs for cubes which present the same values of c and a. When 
compared to figure |3 it is observed that there are much larger differences between the 
FDFs. This is due to the fact that for given c and a, different values of are observed. 
This is what we will call the subgrid variance dispersion. For a given vr the subgrid 
variance dispersion can be quantified by the irreducible error made when estimating 

using vr. When for instance this error is small, the subgrid variance of cubes with 
very close vr values will be very close to | vr). 

[Figure 6 about here.] 

Figure Q the optimal estimators are compared to a /3 distribution whose variance 
is (o"g|c, a) (the average of for all the FDFs). The cubes which are selected for 
the computation of the optimal estimators are chosen so that c = 0.5 ± 0.01 and 
a = 0.0125 ± 0.0005 for the first case, c = 0.355 ± 0.005 and a = 0.01 ± 0.001 for the 
second case, and for the last one c = 0.1 ± 0.01 and a = 0.001 ± 0.001. 

[Figure 7 about here.] 

In this case, the /? distributions are not very close to the optimal estimators. This 
is due to the variance dispersion. This means that there is a better presumed form 
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when a is used - closer to the opthnal estimator. However, this form would be relevant 
for vr = {c, a} only. 

When choosing a set vr with less subgrid variance dispersion, the form of the optimal 
estimator must tend towards the form of the optimal estimator when cj^ is known (when 
the dispersion is null). Hence, we conclude that the /3 assumption is better when the 
subgrid variance dispersion is small. 

Now that we have established that, when cr^ is unknown, /? distributions are not as 
clearly appropriate as when the subgrid variance is known, the question is: is the error 
due to the difference between the optimal estimator of the FDF and the /3 distribution 
a significant one ? 

Using optimal estimators, it is possible to compute the supplementary error brought 



by the /3 assumption alone for the estimation of /(c). This must be done for a given 



Let us consider the following estimator for /(c) using vr: 

5/(vr) = y /(C) a (C; c, (a^j ^» AC. (23) 

It is obtained by replacing d^ in ()13|) by a /3 distribution. The variance of the /3 
distribution is the optimal estimator of cr^ using vr. The error made by this estimator 

vr ) . The supplementary error 



can be compared to the irreducible error made by \J{c) 
made by the estimator H23|) reflects the fact that /3 distributions are not exactly the 
optimal estimators as shown figure [3 (or figure El even if the difference is slight). 

We will now present results that have been obtained using histograms. All the 
quadratic errors have been normalized by the variance of the quantity which is esti- 
mated. This allows to compare two errors even if the quantity to estimate is not the 
same. 

The results concerning the estimation of are presented in table ^ concerning 
the estimation of a^. The irreducible error that is computed here reflects the subgrid 
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variance dispersion. This dispersion is smaller when the variables are c and €c than 
when the variables are c and a. 

Since the error brought by the (3 assumption must be computed for a given / we 
have chosen to do so for several f3 distributions with different means and variances. 
This is a very plausible choice[25.^. The comparison between the optimal estimators 
and the estimator 5/(7r) is presented in table El Column one is the error made by the 
optimal estimators, i.e. the irreducible error. Column two is the error made by ^/(vr), 
i.e. when using the P assumption. 

It can be observed that the supplementary error due to the /3 assumption is always 
much smaller than the irreducible error. In most cases, it is one order of magnitude 
smaller. This confirms the previous results on the optimal estimators of the FDF. 

The fact that the /? distributions are not very close to the optimal estimators of 
the FDFs has not a measurable incidence on the supplementary error. The latter can 
often be neglected and there is no need to search for a more relevant presumed form 
for the FDF. 

Since the irreducible errors are normalized, they can be compared. The conclusion is 
that is a quantity which is very difficult to estimate. The quantity /(c) is in general 
easier to estimate. In addition, the better cj^ is estimated using a set of variables, the 
better the /(c) are estimated. For a few variables, the relative relevancy of a set does 
not depend on /. 

[Table 1 about here.] 
[Table 2 about here.] 

7 Neural networks 

As already underlined 0, the estimation of the subgrid variance is the main problem 
in the presumed FDF approach. Rather complex models have been proposed|6j| using 
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dynamic approach and many new variables. Table El shows the irreducible error for 
different sets of variables computed using neural networks. One of these variables (c) 
has often been neglected in the different models proposed. The results shown here 
suggest this is a mistake. On the contrary, the results show that the dissipation e does 
not bring much information about cr^. 

[Table 3 about here.] 

In addition, they show that the more variables are included in vr, the better the 
estimation of the subgrid quantities - with errors much smaller than with usual models. 
Neural network can thus be considered as a new way towards optimal LES in the sense 
of Langford and Moserp^. We think that they could be used directly for the modelling 
of unknown terms in LES and that they present some advantages: they do not need 
much data to estimate the conditional expectations correctly and they can reproduce 
highly non linear behaviours. It is possible to take physics directly into account when 
choosing the variables (galilean invar iancePOj, scale invariance or dimensional analysis 
arguments). 

8 Conclusion 

In agreement with Langford and MoserJ^, we have used and explored the idea that 
in the framework of LES, once a set of variables has been chosen to estimate a given 
subgrid term, there is only one optimal estimator. Some of the properties of optimal 
estimators that are of interest for LES have been presented throughout this article. We 
have shown how optimal estimators could be computed using simple techniques if the 
number of variables is small, or neural networks if the number of variables is greater 
than 3. 

As an illustration, the concept of optimal estimator was applied to the Cook and Ri- 
ley subgrid model[S] for the scalar fluctuation, a model which is widely used and whose 
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basis has already retained much attention in the hterature. The concept of optimal 
estimator has been extended to the approximation of FDF and the main conclusion is 
that the /? assumption is very appropriate. When the error directly associated with the 
(3 assumption for the estimation of a given quantity is compared with the irreducible 
error, it is found to be small. The largest error are associated with the estimation of 
the subgrid variance of the scalar fluctuations. That is the very point on which the 
Cook and Riley model needs to be improved. 

In the paper we did not attempt to propose any new practical model for this scalar 
variance, but we used neural networks to see how closely the subgrid variance could 
be estimated. The error made by the optimal estimator is smaller when the number of 
relevant variables is increased. 

The optimal estimation technique provides a way of assesssing which set of param- 
eters will potentially lead to the most accurate subgrid model. It does not provide any 
information on how to specify the formulation of the model, but simply indicates if 
efforts for building a model with a given set of parameters are likely to be fruitful. 

As the number of retained parameters is increased, it can become more and more 
difficult to propose a model formulation on the ground of physical reasoning, and this 
might suggest using the optimal estimator directly instead of a model. Then, neural 
networks would be used directly - instead of a modelization. This "NN guided LES" 
could constitute an optimal LES as defined by^ |2H]- We did not perform such a 
simulation in the present paper. This could be the subject of a future study (following 
the path explored by |1H|)- 

This paper is an illustration of the relevance of optimal estimation techniques for the 
problem of subgrid modeling for LES. These techniques allow a thorough exploration 
of the behaviours of models in LES indicating the points which have to be improved 
in the models. 



19 



Acknowledgements 



Olivier Teytaud is supported in part by the Pascal Network of Excellence. The authors 
would like to thank M.F. Moreau for a careful reading of the paper. 

A Orthogonality relation 

Let us give a short demonstration of ©• The quadratic error made by an estimator 
g^n) can be developed: 



Since (7I tt) = J 7p(7|7r) d7, the previous quantity is null and the orthogonality relation 
holds. 



((7-5(vr)f) = ((7- (7k))') + (((7l^)-5(vr)f) 
+2 ((7- (7lvr))((7lvr)-5(^))) 



Now, we have to show that the last term is null. It can be written 
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B Successive conditioning 

Let ^(Tr) be an estimator of 7 using tt. We will note p{g{Tr) = g) or p{g) the PDF that 
^(Tr) = g. Then we have 



{n\g{'K) = g) p{g{-n) = g) = J'jp{-f,g)dj 

= J 7P(7>7r,5)d7d7r 



J 7P(7,7i")(^(5f(7r) -5)d7d7r 

J {^j 7;'(7|7r)d7^ p{'n) 5{g{'n) - g) d-K 

J (7|7r) p{ir)6{g{ir) - g)dir. 



And if g{'K) = (7I tt) then 



(7I (7I tt) = 5) = J (7|7r) p(7r)(5((7|7r) -5)d7r 

= g J PiT^)S{{7\Tr) - g)dTr 

= 9 J P(vr,5)d7r 

= gpig) 



Hence, (7I ^(Tr) = g) = g, which can be written 

(7I (7|7r)) = (7|7r) . 

C Filtering 

The filtering can be written 

c(r) = / G{f—x)c{x)dx 
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or c = G * c. Let us assume that we have < c < 1 and that we want 

< c < 1. (24) 

If G{x) > Vx we can write 

< c < T. 

If G is not positive, the inequahty can be false. Thus it is necessary that G should be 
positive. In order to have (^1]) verified, G must have the property 1 = 1, which can be 
written J G = 1. 

Let us now define Vd{C) = H{C — c{x)). The physical meaning of this quantity is 
not as clear as in the case of the box filter. Anyway, the FDF is still the derivative of 
Vd- We can write 

Voia) - Voib) = H{a-c)-H{b-c). 

If a > 6 then H(a — c(x) — H(b — c(x)) > 0. // the filter is positive, then Voia) > Voib) 
and Vd is a growing function. If G is not positive, this is not granted. Since the FDF 
is the derivative of Vd, it is positive only if the filter is positive. Finally, we have 
^d(I) = 1 = / ds, so that the FDF is normalized only if J G = I. 

The filters are generally chosen so that J G = 1. The previous arguments show that 
they should be chosen positive, too. The box filter has been chosen for historical E] 
and clarity reasons. But it is an invertible filter, whereas the filters that should be 
considered in the framework of LES should be non-invertibleTP. All the non- invertible 
filters that have been proposed are not positive. We propose here a new filter that 
could be more relevant. It is defined by 

G(-) = n^-c^f (25) 

i=l 

This filter is non-invertible (since the Fourier Transform of sinc'^ is a triangle function) , 
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normalized and positive. We think that the choice of a particular filter has not much 
importance while the error made by the estimators is large. For very small errors, a 
difference could probably be found between the error made by the estimators when 
using an invertible filter and the error when using a non-invertible filter. This could 
probably be seen using neural networks when computing the irreducible error. 
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Figure 1: Typical generalization error versus the number of cells for each parameter. 
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Figure 2: Representation of the neural network used to approximate the optimal estimators 
in the case where the number of neurons in the hidden layer is 3 and with two variables only. 
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Figure 4: Comparisons between a j3 law (solid line) and several arbitrary chosen FDFs, 
same mean (c = 0.5 ± 0.01) and variance {al = 0.0055 ± 0.0001) 
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Figure 5: Comparisons between {ds{C) \ c, af) and /3(C; c, o"^) for (i) c = 0.5 and = 0.0055 
(ii) c = 0.25 and = 0.01 (iii) c = 0.1 and = 0.01. 
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Figure 6: Comparisons between a [3 law (solid line) and several arbitrary chosen FDFs, for 
c = 0.5 ± 0.01 and a = 0.0125 ± 0.0005. 
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Estimator Normalized error 







0.215 




c, a) 


0.259 




- Ka 


0.359 



Table 1: Normalized quadratic errors for several optimal estimators of cr^ (the optimal 
estimators are computed using the histogram technique). The constant k, is chosen so that 
the error made by the Cook and Riley estimator of is minimized. 
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Variables (tt) Error of ^/(c) 


TT^ Error of 5'/(7r) 


Mean 0.35, variance 0.01 


c, al 0.043 


0.051 


c,ec 0.095 


0.099 


c,q; 0.136 


0.140 


Mean 0.15, variance 0.01 


c, al 0.031 


0.042 


c,ec 0.055 


0.070 


c,a 0.072 


0.091 


Mean 0.5, variance 0.01 


c, 0.041 


0.063 


c,ec 0.092 


0.107 


c,a 0.130 


0.146 


Mean 0.5, variance 0.036 


c, 0.011 


0.013 


c,ec 0.064 


0.066 


c,a 0.079 


0.081 



Table 2: Normalized quadratic errors of different estimators for the estimation of /(c). The 
mean and the variance of the different / that have been chosen are specified. 
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Variables 


Irreducible error 


c, a 


0,259 


c,ec 


0,215 




0,192 


c, a, e 


0,258 


^) ^C) 


0,173 




0,158 


C, Of, 


0,152 


c, CI, €c, eg, e^; 


0,137 



Table 3: Irreducible error linked to different parameter sets. 64^ examples are used for eval- 
uating the weights of the neural networks and (128'^ — 64^) examples are used for estimating 
the irreducible error. 
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